Multilingual Code Snippets Training for Program Translation
Abstract
Program translation aims to translate source code from one programming language to another. It is particularly useful in applications such as multiple-platform adaptation and legacy code migration. Traditional rule-based program translation methods usually rely on meticulous manual rule-crafting, which is costly in terms of both time and effort. Recently, neural network based methods have been developed to address this problem. However, the absence of high-quality parallel code data is one of the main bottlenecks that impedes the development of program translation models. In this paper, we introduce CoST, a new multilingual Code Snippet Translation dataset that contains parallel data from 7 commonly used programming languages. The dataset is parallel at the level of code snippets, which provides much more fine-grained alignments between different languages than existing translation datasets. We also propose a new program translation model that leverages multilingual snippet denoising auto-encoding and Multilingual Snippet Translation (MuST) pre-training. Extensive experiments show that multilingual snippet training is effective in improving program translation performance, especially for low-resource languages. Moreover, our training method shows good generalizability and consistently improves the translation performance of a number of baseline models. The proposed model outperforms the baselines on both snippet-level and program-level translation, and achieves state-of-the-art performance on the CodeXGLUE translation task. The code, data, and appendix for this paper can be found at https://github.com/reddy-lab-code-research/MuST-CoST.
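The two snippet-level training signals named in the abstract, denoising auto-encoding and snippet translation, can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenization, the `<mask>` token, the `<java>`-style language tags, the noise probabilities, and the function names `add_noise`, `dae_example`, and `translation_example` are all assumptions made for illustration.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def add_noise(tokens, mask_prob=0.15, drop_prob=0.1, rng=None):
    """Corrupt a token list by randomly dropping or masking tokens.

    The noise types and probabilities are illustrative assumptions.
    """
    rng = rng or random.Random()
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < drop_prob:
            continue  # drop this token
        if r < drop_prob + mask_prob:
            noisy.append(MASK)  # replace this token with the mask
        else:
            noisy.append(tok)  # keep this token unchanged
    return noisy


def dae_example(snippet, lang, rng=None):
    """Denoising auto-encoding pair: corrupted snippet -> original snippet."""
    tokens = snippet.split()  # naive whitespace tokenization (assumption)
    return {
        "source": [f"<{lang}>"] + add_noise(tokens, rng=rng),
        "target": tokens,
    }


def translation_example(src_snippet, src_lang, tgt_snippet, tgt_lang):
    """Snippet translation pair built from two aligned snippets."""
    return {
        "source": [f"<{src_lang}>"] + src_snippet.split(),
        "target": [f"<{tgt_lang}>"] + tgt_snippet.split(),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    print(dae_example("int x = a + b ;", "java", rng=rng))
    print(translation_example("int x = a + b ;", "java", "x = a + b", "python"))
```

In this sketch, the denoising pairs need only monolingual snippets, while the translation pairs need snippets aligned across languages, which is the kind of fine-grained alignment a snippet-level parallel dataset provides.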
Similar resources
Interactive Synthesis of Code Snippets
We describe a tool that applies theorem proving technology to synthesize code fragments that use given library functions. To determine candidate code fragments, our approach takes into account polymorphic type constraints as well as test cases. Our tool interactively displays a ranked list of suggested code fragments that are appropriate for the current program point. We have found our system t...
A Multilingual Framework for Searching Definitions on Web Snippets
This work presents Mdef-WQA, a system that searches for answers to definition questions in several languages on web snippets. For this purpose, Mdef-WQA biases the search engine in favour of some syntactic structures that often convey definitions. Once descriptive sentences are identified, Mdef-WQA clusters them by potential senses and presents the most relevant phrases of each potential sense ...
Dynamic and Interactive Synthesis of Code Snippets
Tool for Fast Detection of Java Code Snippets
This paper presents general results on the Java source code snippet detection problem. We propose a tool that uses graph and subgraph isomorphism detection. A number of solutions for all of these tasks have been proposed in the literature. However, although all these solutions are really fast, they compare only constant static trees. Our solution offers to enter an input sample dyna...
Applications in Multilingual Machine Translation
The CAT2 Machine Translation System, developed in Saarbrücken in 1987, is a natural language application coded entirely in Prolog. Since its initial development, several languages have been implemented on an experimental basis to evaluate the translation methodology, the underlying formalism, the linguistic descriptions, and the effectiveness of the Prolog implementation. Seven years later, it...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i10.21434